An Exploration of Formalized Retrieval Heuristics
Authors
Abstract
Empirical studies of information retrieval methods show that good retrieval performance is closely related to the use of various retrieval heuristics, such as TF-IDF weighting. Any effective retrieval formula, however it was originally motivated, often boils down to an explicit or implicit implementation of these heuristics. One basic research question is thus what exactly these “necessary” heuristics are that seem to cause good retrieval performance. In this paper, we present a formal study of these retrieval heuristics. We formally define a set of basic desirable constraints that any reasonable retrieval function should satisfy, and check these constraints on a variety of representative retrieval functions. We find that none of these retrieval functions satisfies all the constraints unconditionally. Empirical results show that when a function fails to satisfy a constraint, this often indicates non-optimality of the method, and when a function satisfies a constraint only for a certain range of parameter values, its performance tends to be poor when the parameter falls outside that range. In general, we find that the empirical performance of a retrieval formula is tightly related to how well it satisfies these constraints. Thus the proposed constraints can provide a good explanation of many empirical observations and make it possible to evaluate any existing or new retrieval formula analytically.
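To make the idea of checking a constraint against a retrieval function concrete, here is a minimal sketch in Python. The abstract does not spell out the constraints, so the rule tested here is an assumption chosen for illustration: for a single-term query and two documents of equal length, the document with more occurrences of the query term should score strictly higher. The scoring functions `tfidf_score` and `binary_score`, the helper `satisfies_tf_constraint`, and the toy statistics are all hypothetical and are not taken from the paper.

```python
import math


def tfidf_score(query, doc, doc_freq, num_docs):
    """Toy TF-IDF scorer (illustrative only): log-scaled TF times IDF."""
    score = 0.0
    for term in query:
        tf = doc.count(term)
        if tf == 0:
            continue
        idf = math.log((num_docs + 1) / doc_freq.get(term, 1))
        score += (1 + math.log(tf)) * idf
    return score


def binary_score(query, doc, doc_freq, num_docs):
    """Toy scorer that only records term presence and ignores term frequency."""
    return float(sum(1 for term in query if term in doc))


def satisfies_tf_constraint(score_fn, term, filler, doc_len, **stats):
    """Assumed constraint: among equal-length documents, more occurrences
    of the single query term must give a strictly higher score."""
    scores = []
    for tf in range(1, doc_len + 1):
        # Build a document of fixed length with exactly `tf` query-term hits.
        doc = [term] * tf + [filler] * (doc_len - tf)
        scores.append(score_fn([term], doc, **stats))
    return all(a < b for a, b in zip(scores, scores[1:]))


# Hypothetical collection statistics for the demonstration.
stats = {"doc_freq": {"retrieval": 10, "the": 90}, "num_docs": 100}

tfidf_ok = satisfies_tf_constraint(tfidf_score, "retrieval", "the", 8, **stats)
binary_ok = satisfies_tf_constraint(binary_score, "retrieval", "the", 8, **stats)
print(tfidf_ok, binary_ok)
```

Under these assumptions the toy TF-IDF scorer passes the check and the presence-only scorer fails it. A numeric check like this only samples a few documents; the analytical approach described in the abstract would establish the property from the formula itself.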
Similar resources
An Exploration of Formalized Information Retrieval Heuristics
Empirical studies of information retrieval methods show that good retrieval performance is closely related to the use of various retrieval heuristics, such as TF-IDF weighting. Any effective retrieval formula, no matter how it is originally motivated, also often boils down to an explicit or implicit implementation of these heuristics. One basic research question is thus what are exactly these “...
Applying Heuristics to Improve A Genetic Query Optimisation Process in Information Retrieval
This work presents a genetic approach for query optimisation in information retrieval. The proposed GA is improved by heuristics in order to solve the relevance multimodality problem and adapt the genetic exploration process to the information retrieval task. Experiments with AP documents and queries drawn from TREC show the effectiveness of our GA model.
Critical Systems Heuristics (CSH) to Deal with Stakeholders' Contradictory Viewpoints of Iran Performance Based Budgeting System
Objective: Performance-based budgeting (PBB) is now an undeniable necessity for effective management of the country's vital resources, and it benefits all economic and social layers of society if properly implemented. Accordingly, it has encouraged many studies of PBB theories, concepts and models. This study deeply reviewed Iran’s PBB system within four basic issues, includ...
Coordinating Order Acceptance and Batch Delivery for an Integrated Supply Chain Scheduling
This paper develops Order Acceptance for an Integrated Production-Distribution Problem in which Batch Delivery is implemented. The aim of this problem is to coordinate (1) rejecting some of the orders, (2) production scheduling of the accepted orders, and (3) batch delivery, so as to maximize Total Net Profit. A Mixed Integer Programming model is proposed for the problem. In addition, a hybrid meta-heuristic...
Multiple query evaluation based on an enhanced genetic algorithm
Recent studies suggest that significant improvement in information retrieval performance can be achieved by combining multiple representations of an information need. The paper presents a genetic approach that combines the results from multiple query evaluations. The genetic algorithm aims to optimise the overall relevance estimate by exploring different directions of the document space. We inv...